RFC: Environment Wrap Actions#130
Conversation
| intercept and wrap the lifecycle actions of *inner* environments and tasks. The runtime | ||
| supplies each wrap action with the wrapped action's command, args, timeout, cancelation | ||
| method, and environment variables as template variables. A companion opt-out, | ||
| `runOnHost: true` on `<Action>`, lets individual actions bypass wrapping when they |
There was a problem hiding this comment.
Could we make this runOnHost also be controlled by the wrapping environment? Then an author of an external environment could have more control over this. If we leave this as is we might want to find a different name, because the distinguishing factor is that it skips the wrapping - both wrapped and non-wrapped actions are running on the host.
There was a problem hiding this comment.
Interesting, unless the wrapping environment inspects and has logic for every line in the wrapped action?
Something like exporting environment variables on the host would be one case that is hard to detect
(And agree on the naming is hard)
| `runOnHost: true` on `<Action>`, lets individual actions bypass wrapping when they | ||
| must run on the host (credential fetching, mount setup, cleanup that must always run). | ||
|
|
||
| The primary motivation is container support: a Docker or Apptainer environment template |
There was a problem hiding this comment.
This is the primary motivation, but it's worth mentioning that we want this to be a general composable feature to the extent possible. If we can avoid all references to containers in the structure of the extension (vs the docs and examples which should have many), I think that will be useful.
| onWrapTaskRun: | ||
| command: "bash" | ||
| args: ["{{Env.File.WrapTaskRun}}"] | ||
| timeout: "{{Task.Timeout}}" |
There was a problem hiding this comment.
The difference between Env.Wrapped.Timeout and Task.Timeout feels unnecessary. Switching to either Task.Wrapped.Timeout, or Env.Timeout would make it consistent. Since the timeout is on the action itself, maybe it could be Task.Action.Timeout and Env.Action.Timeout?
We should also think about the general forwarding mechanism. How will we forward #118 when we make that? Can there be a general mechanism, or will we give the template author responsibility to plumb through the features?
There was a problem hiding this comment.
Interesting, I did not see 118, let me read and incorporate it.
| ### Apptainer environment template | ||
|
|
||
| The same pattern applies to Apptainer (daemonless, each wrap hook invokes | ||
| `apptainer exec` directly rather than exec'ing into a running container). See |
There was a problem hiding this comment.
Can't it run as a daemon? This seems necessary for many use cases.
There was a problem hiding this comment.
I think daemonless was meaning the "apptainer" daemon, like the persistent "docker daemon", not necessarily referring to the container as a daemon. Maybe a framing difference here.
| [Appendix A: Apptainer environment template](#appendix-a-apptainer-environment-template) | ||
| for the full example. | ||
|
|
||
| ### Job template that works with any environment |
There was a problem hiding this comment.
It should just be able to point to our existing samples. Maybe add a technical requirement that the way we've been writing jobs already must work with or without wrapping to the extent we can do that.
I guess the purpose of this is to show the runOnHost option?
There was a problem hiding this comment.
Yeah - exactly. The RunOnHost was an escape hatch.
Lets chat offline how we can offer the escape hatch. Either within the hook, or as an explicit declaration.
| often where performance problems hide. | ||
|
|
||
| 5. **Privilege isolation.** Run inner actions as a different user or with reduced | ||
| capabilities by wrapping the command with `sudo -u`, `unshare`, or a jailed shell. |
There was a problem hiding this comment.
If the inner environments and task runs can select runOnHost then this use case is weaker.
There was a problem hiding this comment.
True, the runOnHost escape hatch would change this. But nonetheless, examples such as running a container would be able to use the sudo -u.
I'll re-frame this as a addendum use case, not necessarily an argument.
| 4. **Cross-OS wrapping.** The same-path bind-mount requirement assumes the host and | ||
| the wrapped execution context share path-separator conventions (Linux host with | ||
| Linux container, or Windows with Windows). Cross-OS wrapping (e.g., a Windows host | ||
| launching a Linux container) is not supported by this RFC. |
There was a problem hiding this comment.
The responsibility of the RFC would be to provide all information necessary to do this Cross-OS wrapping, not to assist in any way. I think it does that?
There was a problem hiding this comment.
True, let me delete it.
Along the way I was thinking - why wrap windows containers in linux, and vice versa. Although linux containers on windows absolutely work but is terrible thorugh WSL.
Was originall thinking to cut some scope or defer this family of cases for later.
|
|
||
| ### Security: container isolation boundaries | ||
|
|
||
| Each container instance should be scoped to a single security boundary. In the |
There was a problem hiding this comment.
The spec isn't about containers, it's about the ability to perform action wrapping reliably. I think the scheduler doesn't have any responsibility about "container," its responsibility is purely about providing all the specified arguments to the wrap* actions. The RFC should focus exclusively on the wrapping part where it is about the OpenJD behavior, and talk about containers in example contexts that are about containers.
There was a problem hiding this comment.
Sounds good - I'll do a pass and move this into my separate design doc instead.
| 2. The container does not run with elevated privileges (`--privileged`) unless explicitly | ||
| configured by the environment template author. | ||
| 3. Bind mounts are scoped to the session working directory and explicitly declared paths, | ||
| not the entire host filesystem. |
There was a problem hiding this comment.
None of these are scheduler responsibilities - they are the responsibility of the environment implementer writing support for containers. The spec should ensure that the environment has sufficient capabilities to do what it needs to do.
There was a problem hiding this comment.
True, I think here, AI here thought "deadline" overall as the scheduler, I'll delete these.
| ``` | ||
|
|
||
| 1. *onEnter* — The action to run when entering the environment. | ||
| 2. *onWrapEnter* — If provided, this action is run instead of the `onEnter` action of |
There was a problem hiding this comment.
Consider adding an all or nothing rule. Either all onWrap* must be included, or none of them. This would help against accidentally implementing just part of the wrapping, then getting hard to debug results because not everything is running where expected.
There was a problem hiding this comment.
Good suggestion! I'll add that into the spec.
While the template yaml / json is slightly larger it is much harder to do the wrong thing.
Signed-off-by: David Leong <116610336+leongdl@users.noreply.github.com>
Update for RFC 0008 reviewersThe RFC has been substantially revised and synthesized into the 2023‑09 spec and wiki. Highlights since the last review: Scope expanded from one hook to three. The proposal now wraps the full environment lifecycle, not just task run:
The file was renamed accordingly: New hard constraints, enforced before session entry:
New template‑variable scopes:
Schedulers must reject templates that reference these outside their allowed hooks. Timeout semantics clarified. A wrap hook's Exit‑code 125 reserved for infrastructure failure (Docker daemon unreachable, image pull failed, SSH refused, etc.). Other non‑zero exits are workload failures. Schedulers may use this to drive retry / worker‑eviction / alerting. Stdout macro handling specified. Schedulers scan the wrap script's stdout for OpenJD macros ( Synthesized into the normative docs:
Re‑review focus suggestions: the all‑or‑nothing + single‑layer rules, the |
| onWrapEnter: | ||
| command: bash | ||
| args: ["{{Env.File.Wrap}}"] | ||
| timeout: "{{Env.Wrapped.Action.Timeout}}" |
There was a problem hiding this comment.
Thinking some more about the names for the wrapped action variables, these two properties would be useful:
- They have exactly the same name across all of the
onWrap*actions. - They're compatible with a future introspection where the action can reference itself, probably via
Action.*.
Based on that, instead of Env.Wrapped.Action.* and Task.Action.*, the names WrappedAction.* is a good candidate to use uniformly.
There was a problem hiding this comment.
Sounds good, I'll refactor to just WrappedAction.*
| #!/usr/bin/env bash | ||
| set -euo pipefail | ||
| DOCKER_CONTAINER_ID=$(docker container run --rm --detach \ | ||
| --mount 'type=bind,src={{Session.WorkingDirectory}},dst={{Session.WorkingDirectory}}' \ |
There was a problem hiding this comment.
'type=bind,src={{Session.WorkingDirectory}},dst={{Session.WorkingDirectory}}'
For this part we'll want it to be inside of a repr_sh instead of in single quotes. Does that need a let binding or is there a different good way to represent it?
There was a problem hiding this comment.
I excluded f-strings from the EXPR extension, but maybe that's how it could be more pleasant to write?
There was a problem hiding this comment.
We can add f-format strings later since it is a programming improvements.
After going through more examples, I actually think let bindings is a great way, since we can print them out, not recompute the variable etc. This can be a great programmer guide and suggestion.
Will update with repr_sh, good suggestion. I guess it is useful mainly on CMF, since SMF we use a simpler directory.
| {{ repr_sh(Env.Wrapped.Action.Command) }} \ | ||
| {{ repr_sh(Env.Wrapped.Action.Args) }} | ||
|
|
||
| - name: WrapTask |
There was a problem hiding this comment.
Looks like Wrap and WrapTask could be the same script file if the wrapped action variable names are the same for this example.
There was a problem hiding this comment.
True - given the comment above, I'll align the variables to WrappedAction then this example can be shorter. Its still flexible to have specific scripts per hook, so usability wise it is good.
|
|
||
| ### Execution order with a wrapping environment | ||
|
|
||
| For a session with a wrapping Docker environment `A` and a non-wrapping inner step |
There was a problem hiding this comment.
Is it worth mentioning here that the wrapping environment doesn't have to be outermost? B could be the wrapping environment, then A runs like normal, B wraps the tasks' onRun.
There was a problem hiding this comment.
Yeah, that's definitely an option. As long as it meets our tenet of 1 wrapped action at most in the whole structure.
I'll add a 1 sentence to keep it short, and a conformance test as example.
| ``` | ||
|
|
||
| On a queue with no wrapping environment, `A` does not exist and every action runs | ||
| on the host. The job template and the inner environments are unchanged. |
There was a problem hiding this comment.
Instead of "on the host," maybe say "normally" since the wrapped actions typically run on the host as well.
There was a problem hiding this comment.
Yes - I have applied this change to remove "on the host" generally throughout this document.
| A.onExit → HOST | ||
| ``` | ||
|
|
||
| On a queue with no wrapping environment, `A` does not exist and every action runs |
There was a problem hiding this comment.
Open Job Description doesn't define what a queue is. Deadline Cloud queue environments are based on the environment template specification. Should reword to ensure only the OpenJD terms are being used.
There was a problem hiding this comment.
Good point :) Now a steering rule in my own skills!
|
|
||
| > Job templates should be portable in a way to run them, unmodified, with either a | ||
| > Conda, Rez, Docker, or Apptainer environment template that provides the software | ||
| > environment to run in. |
There was a problem hiding this comment.
The way this is written it looks like a quote from the tenets, but it's an example application of the portability tenet into this application. Because of that I would remove the > quoting from it to make it clear this text is primary not a quote.
| context (environment variables, timeout, enter/task/exit distinction) and | ||
| composes through the environment-template model. | ||
|
|
||
| ### Docker exec pattern in OpenJD |
There was a problem hiding this comment.
Suggest to delete this section, bash-in-docker was already referenced above.
| under `.Action.` maps to a named field of the wrapped `<Action>` object. This has | ||
| three concrete benefits: | ||
|
|
||
| 1. Symmetry between the task and environment branches (both use `.Action.`), |
There was a problem hiding this comment.
You can't copy/paste them though, I think matching the full names exactly is better.
| straightforward to iterate over in a list comprehension: | ||
|
|
||
| ```yaml | ||
| {{ repr_sh(flatten([['-e', e] for e in Task.Action.Environment])) }} |
There was a problem hiding this comment.
Can you check if e.split('=', 1) works as expected to be able to separately access the key and value?
|
|
||
| ## Security Considerations | ||
|
|
||
| ### Command injection |
There was a problem hiding this comment.
I would frame this more that the template author is responsible to ensure the template is correct about quoting all input, and our job in the spec is to provide primitives that enable them to do so. repr_sh is key to that.
There was a problem hiding this comment.
That is true - the spec can only give the best practice. Our examples is the demonstrator, but the user is the one is responsible to re-use our examples. Will do!
| would fit directly. An `onRun` near the 256 KB macOS `ARG_MAX`, for example, | ||
| may fail only when wrapped. | ||
|
|
||
| Implementations SHOULD validate command-line length before exec and produce |
There was a problem hiding this comment.
Who does this affect? It's unclear whether this is about the OpenJD implementation or the template?
| (the wrap environment) from the job-logic layer (the job template), rather than | ||
| annotating individual processes with a runtime identifier. | ||
|
|
||
| ### Universal Wrapper pattern |
There was a problem hiding this comment.
This seems like it's describing the idea of this spec, not prior art? I'd suggest to remove this section.
| exec'ing tasks into it. This RFC generalizes that pattern to work at any | ||
| environment level and extends it to cover inner-environment setup and teardown. | ||
|
|
||
| ### Conda and Rez queue environments |
There was a problem hiding this comment.
This section doesn't seem to describe precedent. Both conda and rez have run commands that could be used in a wrapping environment though, maybe that's worth mentioning here. Otherwise I'd delete this section.
| explicit wrap-hook approach keeps the specification general and the user in | ||
| control. | ||
|
|
||
| ### Global config toggle for runtime selection |
There was a problem hiding this comment.
Maybe merge this with the above section to tighten up the RFC (We're at line 923! I think it can be shorter).
There was a problem hiding this comment.
Makes sense, too much details :)
| > 3. *EXPR prerequisite.* A template that lists `WRAP_ACTIONS` in `extensions:` must also list `EXPR`. | ||
| > Schedulers must reject templates that list `WRAP_ACTIONS` without also listing `EXPR`. | ||
| > 4. The wrapping environment's own `onEnter` and `onExit` are never wrapped; they always run on the | ||
| > host. |
There was a problem hiding this comment.
Suggest "run normally" instead of "run on the host." You'll want to do a sweep for this one throughout.
| The host uses the return code of the **command** run to determine success or failure of the Action. A zero exit code | ||
| indicates success, and any non-zero exit code indicates failure. A timeout also indicates failure. | ||
|
|
||
| When the WRAP_ACTIONS extension is in use, wrap scripts should signal infrastructure failure (Docker |
There was a problem hiding this comment.
Do we already have this kind of signal defined elsewhere? This doesn't feel specific to wrap actions, I think we'd either exclude this, or define this as a general thing that all actions can do.
There was a problem hiding this comment.
Interesting - I was thinking about timeouts and how to handle docker general failures such as cannot start shared device, and found this 125 behavior. We can definitely cut it for now, and if we have a specific need to distinguish a wrapper failure vs task failure we can introduce this.
(Seemed like an interesting add, but can also be a single sentence for the future to consider.)
| in the **Session** until the **Environment** that emitted it is exited. | ||
|
|
||
| When the WRAP_ACTIONS extension is in use, schedulers must scan the stdout of the wrap script (`onWrapEnter`, | ||
| `onWrapTaskRun`, or `onWrapExit`) for these macros, not the stdout of the wrapped process. Wrap scripts must |
There was a problem hiding this comment.
Do we need to specify this? The scheduler has no visibility to the inside process, there might be multiple or no inside processes at all. Because of that I don't think there are alternatives to what's being described.
There was a problem hiding this comment.
I think the word scheduler is confused here, how about I just say "forward the macros" so the wrapped process's messages are still emitted by the wrapper so we do not lose the content.?
|
|
||
| **Available in `onWrapEnter` and `onWrapExit`:** | ||
|
|
||
| | Variable | Type | Description | |
There was a problem hiding this comment.
We have decided to unify the variables as per above.
| - Variables set by the wrapping environment's own `onEnter` through any | ||
| channel other than `openjd_env` stdout macros. | ||
|
|
||
| Wrap scripts that need any of these variables in the wrapped execution |
There was a problem hiding this comment.
Try to shorten this paragraph and section, it is defining only forwarding semantics.
| channel other than `openjd_env` stdout macros. | ||
|
|
||
| Wrap scripts that need any of these variables in the wrapped execution | ||
| context MUST forward them explicitly. For Docker, this typically means |
There was a problem hiding this comment.
Docker is "example", no need to be so wordy. The RFC should be agnostic of docker.
| `Command` and `Args` may reference host filesystem paths, most commonly | ||
| embedded files (`{{Env.File.X}}` resolves to a host path), session working | ||
| directory entries, or job-attachment paths. The wrap environment MUST ensure | ||
| that the wrapped process can resolve every such path, typically by |
There was a problem hiding this comment.
We can shorten this - docker and containers is an example to use bind mount and maintain path equivalence.
| + - An inner Environment's onExit is replaced by the wrapping Environment's onWrapExit. | ||
| + | ||
| + The wrapping Environment's own onEnter and onExit are never wrapped; they always run | ||
| + on the host. If more than one Environment in the session stack defines any wrap hook, |
There was a problem hiding this comment.
nit, replace "on the host" as per above.
|
|
||
| ### Stdout forwarding and macro propagation | ||
|
|
||
| The wrap script is the scheduler's direct child; the wrapped process is a |
There was a problem hiding this comment.
Double check the use of "scheduler" here - does the session runner make more sense?
|
|
||
| Wrap scripts MUST forward the wrapped process's stdout and stderr to their own | ||
| stdout and stderr verbatim, without buffering, filtering, or transformation. | ||
| This is the default behavior of `docker container exec`, `apptainer exec`, and |
|
|
||
| #### Macro propagation (contract) | ||
|
|
||
| Schedulers MUST scan the wrap script's stdout for OpenJD stdout macros: the |
There was a problem hiding this comment.
NOt the scheduler - this should be transparently propagated via logging as per previous section. The OJD session runtime detects the openjd_ macros and applies them normally.
Given the wrapper, logs are pass through, and wrapped task's macro usage are unchanged and applied to the job as per the spec.
| `openjd_env`, `openjd_fail`, `openjd_progress`, and `openjd_status` macros | ||
| defined in [How Jobs Are Run](https://github.com/OpenJobDescription/openjd-specifications/wiki/How-Jobs-Are-Run), | ||
| along with any future macros added by later specifications. Schedulers MUST | ||
| NOT scan the grand-child's stdout directly. This preserves the existing |
There was a problem hiding this comment.
Rephrase or remove this - the word scheduler and scan here does is not correct.
| - Variables emitted by wrapped actions (an inner environment's `onEnter` | ||
| running via `onWrapEnter`, or earlier wrapped task `onRun` invocations). | ||
|
|
||
| This is the chain that makes wrap composition useful: an inner Conda |
| [Basic Example](#basic-example) uses this pattern to publish the container | ||
| ID for use in subsequent wrap hooks. | ||
|
|
||
| #### Alternatives considered |
There was a problem hiding this comment.
Just remove the alternative considered for openjd_env propagation. Move it to out of scope above and propose current semantics of the spec apply and pass through the wrap. (Or consider the use of transitive?)
|
|
||
| 1. **Workload failure**: the wrapped action itself exited non-zero. The | ||
| job logic is at fault. | ||
| 2. **Infrastructure failure**: the wrap script could not deliver the |
There was a problem hiding this comment.
As per discussion, we combine infrastructure failure and workfload failure. Return non-zero and keep it simple. The exit code 125 can be moved to a future consideration if separating the failure mode is important.
| Finer-grained classification (network vs. daemon vs. image-pull) is deferred | ||
| to [Future Work](#future-work). | ||
|
|
||
| #### Scheduler policy |
There was a problem hiding this comment.
We can remove this section #### scheduler policy since it is superfluous.
| stops the container, closes the remote session, releases the bind-mount, or | ||
| otherwise releases the host resources allocated by `onEnter`. | ||
|
|
||
| **Summary of run-on-failure guarantees:** |
There was a problem hiding this comment.
Lead with this table and keep each description above shorter. Can make it easier to consume as a reader.
|
|
||
| Any of the following idioms satisfies the requirement: | ||
|
|
||
| - Shell wrap scripts SHOULD `trap` SIGTERM and forward it, or `exec` the |
There was a problem hiding this comment.
Can we simplify this section - signal and error code propagation is the key and focus The inner task or enter exit fails, it is propagated by the wrap script. Keep 1 example with a sig term, and 1 with docker. Make this shorter.
| the wrap script to the wrapped process is normative and specified in | ||
| [Cancelation behavior](#cancelation-behavior). | ||
|
|
||
| ## Design Choice Rationale |
There was a problem hiding this comment.
This can be an appendix. Its good for discussion, maybe can be deleted in the final version.
…d conformance suite Address PR OpenJobDescription#130 review feedback. Collapse Env.Wrapped.Action.* and Task.Action.* into a single WrappedAction.* namespace shared by all three wrap hooks; this lets a single helper script be reused across hooks and composes cleanly with future Action.* self-introspection. Drop the duplicate Out-of-scope section, the inline timeout-rationale comment in the Basic Example, and Docker-specific examples in the Host-environment section. Replace "queue"/"on the host" wording per OpenJD terminology. Note that the wrapping environment is not required to be the outermost. Collapse near-identical wrap scripts in the appendix templates (Docker 4→2, Apptainer 3→1, SSH 2→1, Profiling 2→1); EchoWrap keeps three to demonstrate per-hook customization. Add the WRAP_ACTIONS conformance suite: 21 valid .test.yaml + 7 .invalid fixtures covering interception of all three hooks, the all-or-nothing rule, the single-wrap-layer rule, the variable-scope rule, extension gating, openjd_env propagation, and the shell-quoting security cases from the implementation considerations.
| command: sh | ||
| args: | ||
| - "-c" | ||
| - "{{ repr_sh(WrappedAction.Command) }} {{ repr_sh(WrappedAction.Args) }}" |
There was a problem hiding this comment.
Curious if this should "fail" - since the enter is not wrapped.
There was a problem hiding this comment.
It looks valid to me because that's syntax check, not a runtime check that the template it's applied against actually has an onEnter.
| This RFC is additive and gated by the `WRAP_ACTIONS` extension name declared under | ||
| RFC 0002. Specifically: | ||
|
|
||
| - Schedulers that do not implement `WRAP_ACTIONS` MUST reject templates that list |
There was a problem hiding this comment.
Curious about the use of the word "schedulers", is there a specific verb we call openJD runtimes?
| wrap hook's `<Cancelation>` policy (see | ||
| [Cancelation behavior](#cancelation-behavior)). | ||
|
|
||
| ## Design Choice Rationale |
There was a problem hiding this comment.
Curious if we would like to keep this section - for brevity.
| # onEnter, onWrapEnter, onWrapTaskRun, onWrapExit, or onExit must be defined. | ||
| specificationVersion: environment-2023-09 | ||
| extensions: | ||
| - WRAP_ACTIONS |
There was a problem hiding this comment.
I think this should already fail because EXPR is missing, so it doesn't test what the comment says.
| args: ["entered"] | ||
| onWrapEnter: | ||
| command: echo | ||
| args: ["wrap-enter", "{{Env.Wrapped.Name}}"] |
There was a problem hiding this comment.
Going with WrappedEnv.Name (and likely WrappedStep.Name inside onWrapTaskRun is probably more consistent with the WrappedAction convension.
| - "{{ repr_sh(WrappedAction.Command) }} {{ repr_sh(WrappedAction.Args) }}" | ||
|
|
||
| runOn: | ||
| - posix |
There was a problem hiding this comment.
If this isn't posix-specific, better to run on all OS.
| script: | ||
| actions: | ||
| onWrapEnter: | ||
| command: sh |
There was a problem hiding this comment.
Suggestion change this to bash as that's what we generally have on Windows as well.
| command: echo | ||
| args: ["ORIGINAL_TASK"] | ||
|
|
||
| environments: |
There was a problem hiding this comment.
Something to double check: Make sure we're testing with wrapping environment inside the job (both as job environments and step environments) as well. I'm not very far in so good chance they're later!
| lifecycle actions of inner environments and tasks with before/after logic | ||
| provided by the outer environment. | ||
|
|
||
| ### Workflow DSLs with container abstraction |
There was a problem hiding this comment.
Maybe we skip this part, it's not actually about wrapping.
| (the wrap environment) from the job-logic layer (the job template), rather than | ||
| annotating individual processes with a runtime identifier. | ||
|
|
||
| ### Conda and Rez run commands |
There was a problem hiding this comment.
Could include Docker here too, its command is similar.
|
|
||
| ## Rejected Ideas | ||
|
|
||
| ### Extending `<Action>` with a `wrapper` field |
There was a problem hiding this comment.
Consider dropping this one.
| Stacking multiple wrap environments (for example, a profiling wrapper outside a | ||
| container wrapper) is valuable but adds implementation complexity. Deferred. | ||
|
|
||
| ### Wrap-aware cancelation |
There was a problem hiding this comment.
If this isn't too bad to implement, maybe we want to add this in?
| from the wrap mechanic. A separate RFC for environment health monitoring is the | ||
| appropriate venue. | ||
|
|
||
| ### Cross-OS wrapping |
There was a problem hiding this comment.
Consider dropping this one.
PR for RFC
Tracking Issue: #132
This is a request for comments about RFC 0008 — Environment Wrap Actions, which
extends
<Environment>with three new session actions —onWrapEnter,onWrapTaskRun, andonWrapExit— that let an environment template intercept andwrap the lifecycle actions of inner environments and tasks. A companion opt-out,
runOnHost: trueon<Action>, lets individual actions bypass wrapping when theymust run on the host (credential fetching, mount setup, cleanup that must always
run).
The primary motivation is portable container support: a Docker or Apptainer
environment template can start a container in
onEnter, route every innerenvironment's
onEnter/onExitand every task'sonRuninto the container via thethree wrap hooks, and stop the container in
onExit. Job templates and innerenvironments remain portable across Conda, Rez, Docker, and Apptainer. The design
also generalizes to remote execution, session-wide instrumentation, and privilege
isolation.
This RFC is gated by the new
WRAP_ACTIONSextension and depends on RFC 0002(Model Extensions), RFC 0005 (Expression Language), and RFC 0006 (Expression
Function Library — for
repr_sh()and friends).By submitting this pull request, I confirm that you can use, modify, copy, and
redistribute this contribution, under the terms of your choice.